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1 Abstract 

2 Interspecific gene flow is pervasive throughout the tree of life. Although 

3 detecting gene flow between populations has been facilitated by new analytical 

4 approaches, determining the timing and geography of hybridization has remained 
s difficult, particularly for historical gene flow. A geographically explicit 

6 phylogenetic approach is needed to determine the ancestral population overlap. In 

7 this study, we performed population genetic analyses, species delimitation, 

8 simulations, and a recently developed approach of species tree diffusion to infer 

9 the phylogeographic history, timing and geographic extent of gene flow in the 

10 Sceloporus spinosus group. The two species in this group, S. spinosus and S. 

11 horridus, are distributed in eastern and western portions of Mexico, respectively, 

12 but populations of these species are sympatric in the southern Mexican highlands. 

13 We generated data consisting of three mitochondrial genes and eight nuclear loci 

14 for 148 and 68 individuals, respectively. We delimited six lineages in this group, 

15 but found strong evidence of mito-nuclear discordance in sympatric populations of 

16 S. spinosus and S. horridus owing to mitochondrial introgression. We used 

17 coalescent simulations to differentiate ancestral gene flow from secondary contact, 
is but found mixed support for these two models. Bayesian phylogeography 

19 indicated more than 60% range overlap between ancestral S. spinosus and S. 

20 horridus populations since the time of their divergence. Isolation-migration 

21 analyses, however, revealed near-zero levels of gene flow between these ancestral 

22 populations. Interpreting results from both simulations and empirical data 

23 indicate that despite a long history of sympatry among these two species, gene 

24 flow in this group has only recently occurred. 



25 Key words: Mexico, mito-nuclear discordance, Bayesian phylogeography, hybridization, 

26 gene flow, coalescent simulations, species delimitation 
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Introduction 

The topic of hybridization, or gene flow between evolutionary independent 



28 lineages, has captivated evolutionary biologists for nearly two centuries (Darwin 1859 



Harrison 1993). Gene flow between species is common in nature with approximately 



10% and 25% of animal and plant species known to hybridize, respectively (Mallet 



2005). Although hybrid zones have been identified across a variety of organisms 



32 (Abbott et al 2013 Larson et al 2013), determining the temporal and geographic extent 



33 of hybridization has remained a difficult task (Hewitt 2001). 



Analytical advancements in the field of phylogeography have enabled sophisticated 
model-testing approaches, including the ability to test demographic scenarios including 



36 gene flow (Avise 2000; Knowles 2009 Hickerson et al 2010). New phylogeographic 



methods, and Bayesian phylogeography in particular, infer the geographic diffusion of a 
clade over time within a coalescent-based framework and have therefore enabled the 
simultaneous estimation of the spatial and temporal history of individuals and 



populations (Lemey et al 2009, 2010 Nylinder et al 2014). Whereas the initial 



implementation of Bayesian phylogeography required discretized areas (e.g., countries) 



and assumed a time- homogeneous process of geographic diffusion (Lemey et al 2009) 



recent modifications have enabled the analysis of continuous geographic data (e.g., 
latitude /longitude coordinates) and heterogeneous geographic diffusion rates amongst 



individuals, and most recently, amongst species (Nylinder et al 2014). However 



examining species-level phylogeography requires an accurate knowledge of the species 
limits. But species limits, particularly within closely related groups of species in the 



tropics, are often unknown(e.g., Barley et al 2013). Identifying species in an objective 
manner is requisite to defining groups for species-level phylogeographic analysis. 

The timing of sympatry or allopatry amongst ancestral ranges of closely related 
lineages can be determined by applying absolute dates to phylogeographic analyses. 
Knowing this information is of primary concern when comparing phylogeographic 
models of divergence with gene flow vs. a model of secondary contact. For instance, two 
species that presently have overlapping distributions might be assumed to be in 
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secondary contact if the ancestral ranges of the species were allopatric (Pettengill & 



Moeller 2012). Similarly, determining colonization times in areas of hybridization can 



[help define times of population expansion when testing models of gene flow (e.g., Smith 



et al 2011). And finally, understanding the geographic and temporal occurrence of 



59 particular clades along with the geologic history of the study region can help elucidate 



eo the biogeographic mechanisms shaping phylogeographic patterns (e.g., Chiari et al 



2012). 



Coalescent simulations are a valuable tool for testing alternative phylogeographic 



scenarios (e.g., Knowles 2001 Kuhner 2009 Pelletier & Carstens 2014). Modeling 



64 genetic variation within a coalescent framework enables quantitative tests of alternative 



65 population histories and the estimation of population genetic parameters (e.g., Hudson 



2002). The modeled population histories are often generated based on inferences 



obtained from geological data (Carstens et al 2005), paleoclimatic data (Spellman & 



Klicka 2006), or based on previous genetic studies (Tsai h Carstens 2013), and the 



69 parameterizations used in the models can be derived from estimates from empirical data 



70 (Carstens et al 2005). The parameter estimates based on the empirical data are then 



compared to the distribution of simulated values, allowing for the acceptance or 
rejection of each hypothesis. In such a way, a vast majority of phylogeographic models 
otherwise indistinguishable when only utilizing empirical data can be reduced to a 



reasonable set of candidate models (e.g., Pelletier k, Carstens 2014). 



In this study, we examined temporal and geographic patterns of gene flow to 
investigate the phylogeographic history of the Sceloporus spinosus group. The S. 



spinosus group consists of two species, S. spinosus and S. horridus (Wiens & Reeder 



1997 Smith & Chiszar 1992) that are broadly distributed throughout xeric habitats in 



Mexico (Smith 1939; Cole 1970 Frost 1978). Each species is composed of three 



subspecies: S. s. spinosus, S. s. apicalis, and S. s. caeruleopunctatus, and S. h. 



horridus, S. h. albiventris, and S. h. oligoporus (Frost 1978; Smith & Chiszar 1992). 



Sceloporus spinosus is primarily found in and near the (eastern) Sierra Madre Oriental 
mountain range, whereas S. horridus is largely distributed in the lower slopes of the 
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(western) Sierra Madre Occidental mountain range. Similarities in the habitat 
preferences of these two species have led to areas of sympatry, and suspected 
hybridization, in southern Mexico (Fig. 1). 

In addition to sharing habitat preferences, S. spinosus and S. horridus also share 
similar morphologies, thus making "attempts at determining phylogenetic relationships 
among spinosus group species on the basis of classical characteristics of scutellation and 



90 color pattern considerably frustrating" (Cole 1970). In fact, previous researchers have 



9i proposed contact zones in Puebla and Oaxaca to explain the observed morphological 



92 overlap in traits (Frost 1978 Boyer et al 1987). Beyond identification of species 



distinguishing between subspecies has also proven to be difficult. For instance, overlap 



94 in quantitative characters exists between 5". spinosus subspecies (Smith & Chiszar 



1992), and intergradation has also been suspected between many of the subspecies (S. s. 



96 spinosus x S. s. apicalis, S. s. apicalis x S. s. caeruleopunctatus, and S. h. albiventris x 



97 S. h. oligoporus) (Frost 1978; Smith h Chiszar 1992) 



We aim to determine the temporal and geographic extent of overlap between S. 
spinosus and S. horridus with multi-locus nuclear DNA (nDNA) and mtDNA. We first 
used population assignment and species delimitation analyses to identify the number 
and geographic boundaries of distinct populations within each species, and then inferred 
phylogenetic trees for the nDNA and mtDNA data. We then performed coalescent 
simulations to model potential historic phylogeographic scenarios that could have 
generated the strong pattern of mito-nuclear discordance that we observed in the 
empirical data. In addition to testing models of divergence with gene flow and 
secondary (2°) contact, we utilized a new Bayesian phylogeographic approach that 



estimates the diffusion of populations through time (Nylinder et al 2014). This 



approach provided us with temporal and spatial information for discriminating between 
models of divergence with gene flow vs. 2° contact. 



Materials & Methods 
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Taxon Sampling 

One hundred fourty-eight individuals were sampled across the distributions of S. 
horridus and S. spinosus (Fig. 1; Supplemental Table 1). Four samples of S. 
edwardtaylori were included for analysis because recent work (using combined n- and 



mtDNA) showed that this taxon is nested within the S. spinosus group (Leache 2010 



Wiens et al 2010). Assignment of individuals to species and subspecies was based on 



us morphological character descriptions by Smith (1939), Smith & Smith (1951), Frost 



H9 (1978), and Smith & Chiszar (1992). Of these 152 individuals, 81 yielded nuclear 



sequence data. However, after data refinement (see below), a total of 70 individuals, 
including two S. edwardtaylori individuals, were represented in the nuclear DNA 
(nDNA) analyses. Both S. horridus and S. spinosus were nearly equally represented in 
the nDNA dataset (Supplemental Table 2). Three individuals of Sceloporus clarkii were 
included in our dataset to serve as the outgroup for phylogenetic analyses. 

Molecular Data Collection 
Genomic DNA was extracted from tissue using the Qiagen extraction kit. A total 
of three mtDNA regions and eight nDNA loci were targeted for sequencing and analysis; 
five of the nDNA regions are protein-coding (BACH1, EXPH5, KIAA_2018, NKTR, and 
R35), one region is intronic (NOS1) and two are anonymous loci. We sequenced 
portions of the mitochondrial genes encoding the fourth unit of the NADH 
dehydrogenase (ND4, and adjacent genes encoding the tRNAs for histidine, serine, and 



132 leucine; Arevalo et al 1994), the 12S ribosomal gene (Leache & Reeder 2002), and 



133 cytochrome B (Kocher et al 1989). 



Standard PCR protocols were used to amplify mitochondrial DNA (mtDNA), 
whereas a "touch-down" protocol was used to amplify the nDNA regions (94° C for 
1:00, [0:30 at 94° C, 0:30 at 61° C, 1:30 at 68° C] x 5 cycles, [0:30 at 94° C, 0:30 at 59° 
C, 1:30 at 68° C] x 5 cycles, [0:30 at 94° C, 0:30 at 57° C, 1:30 at 68° C] x 5 cycles, and 
[0:30 at 94° C, 0:30 at 50° C, 1:30 at 68° C] for 25 cycles). Diploid nuclear genotypes 



139 were phased using the program PHASE (Stephens et al 2001) where alleles were 
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wo discarded if any site probability was <0.95 (resulting in <20% data reduction). We 



Rested for intragenic recombination using the difference in sum-of-squares test (McGuire 



& Wright 2000) in TOPALi (Milne et al 2009) using a step-size of lObp and a window 



size of lOObp for 500 parametric bootstraps. 



Population Assignment 
We explored two methods to identify distinct populations within the S. spinosus 
group that utilize multi-locus nDNA data and require no a priori knowledge of 
population assignment or number of populations. We used the Bayesian program 



STRUCTURAMA (Huelsenbeck & Andolfatto 2007; Huelsenbeck et al 2011) to identify 



the number of populations (k) present in our data. To ensure that the posterior 
distribution was not sensitive to the prior mean value for k, we chose the prior value to 
range between 1 and 10, assuming no admixture between populations (assuming 
admixture resulted in unstable results, where more populations were inferred than 
individuals in our dataset). We ran four replicates of each STRUCTURAMA analysis 
for a length of 10 6 generations and a burn in of 2.5xl0 5 (analyses ran for 2xl0 6 , 5xl0 6 , 
and lOxlO 6 produced similar results; results not shown). We present results as the 
arithmetic mean of the four replicate analyses. 

To estimate the number of populations within a geographic context, we used the 
program Geneland flGuillot et a/1|2005a||b] |Guillot|[2008| ). This program uses a spatial 
statistical model and Markov chain Monte Carlo sampling with GPS coordinates and 
multi-locus genotypes to estimate the number of populations, individual assignment 
probabilities, and the geographic limits between populations that are in 
Hardy- Weinberg equilibrium. We varied the number of populations from 1-10 with a 
spatial correlation between allele frequencies, and ran five independent analyses with 
the same parameters for 10 6 generations. We modified the format of the Geneland 



output files and combined the results with the program CLUMPP (Jakobsson & 



Rosenberg 2007) to generate individual assignment probabilities. 
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Phylogenetic Tree Estimation 
we We estimated maximum likelihood phylogenetic trees for each nDNA locus in 

169 addition to concatenated nDNA and mtDNA datasets separately to examine the 



no concordance in evolutionary history between these genomes. RAxML (Stamatakis 2006) 

171 was used with the GTR + T nucleotide substitution model and run for 500 

172 nonparametric bootstrap iterations for both n- and mtDNA analyses, where one out of 

173 two alleles was randomly chosen to represent each individual in the concatenated nDNA 

174 analysis. Partitioning the data by gene vs. codon position did not affect topology or 

175 branch length estimates, so we present results from partitioning by codon position. We 

176 ran two replicates of each analysis to ensure stability of our results. Phylogenetic 



177 Relationships were considered significant when bootstrap (bs) values were > 70% (Hillis 



& Bull 1993 Alfaro et al 2003). 



Bayes Factor Delimitation of Species (BFD) 
To delimit evolutionarily independent lineages, we performed Bayes Factor 



181 Delimitation of species (BFD) using only the nDNA dataset (Grummer et al 2014). 

182 Our species delimitation models were based on a combination of the results from 

183 population assignment and migration analyses. Gene flow violates the coalescent model 



184 used in the species tree estimation program *BEAST (Heled & Drummond 2010), so we 

185 performed species delimitation on two distinct datasets in an effort to remove 

186 potentially admixed individuals located on population margins. One dataset consisted 

187 of individuals limited to the "core" range of each population as determined in Geneland 

188 (see Results section below), whereas the other dataset consisted of all individuals (Fig. 

189 3). Our expectation was that the dataset consisting of all individuals would be more 

190 likely to support the recognition of fewer species because gene flow between populations 

191 homogenizes gene pools and makes divergent populations appear as one. Six species 

192 delimitation models were tested with each dataset: 1) the six-population model where 

193 each population based on population assignment analyses was distinct (the "6 pop" 

194 model), 2) a model of five species where the northern and central populations of S. 
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195 horridus were lumped together (the "northern horridus migration" model), 3) a second 

we five-species model where central and southern populations of S. horridus were lumped 

197 together (the "southern horridus migration" model), 4) a third five-species model with 

198 central and southern populations of S. spinosus lumped together (the "southern 

199 spinosus migration" model), 5) a four-species model with all populations of S. horridus 

200 lumped together (the "all horridus migration" model), and lastly 6) a two-species 

201 model where the three populations of each S. horridus and S. spinosus are represented 

202 as a single species (the "2 pop" model). Models 2-5 are based on "lumping" lineages 

203 together that were inferred to have non-zero migration rates between them (see Results 

204 below). We ran *BEAST with the same settings as in our Bayesian phylogeographic 

205 analyses (below), and selected the best species delimitation model through Bayes factor 

206 (Bf) analysis of the path sampling ("PS") and stepping stone ("SS") marginal 



likelihood estimates (Baele et al 2012). A model is considered significantly better than 



208 the rest if the Bf value is greater than 10 (Kass & Raftery 1995). 

Genealogical Sorting Index 

209 We performed simulations to discern whether gene flow occurred amongst 

210 ancestral (i.e., divergence with gene flow) or extant populations (i.e., secondary 

2n contact). To determine when gene flow occurred in the S. spinosus group, we used the 

212 genealogical sorting index (gsi). The gsi is a statistic that estimates the degree of 

213 exclusive ancestry of individuals in labeled groups on a rooted tree and is a statistically 



2w more powerful measure of population divergence than F$t (Cummings et al 2008). The 

215 gsi statistic can range from 0 to 1, where the maximum value of 1 is achieved when a 

216 group is monophyletic, and is normalized to account for disparities in group sizes while 

217 also accommodating unresolved relationships (i.e., polytomies). Although genealogical 

218 exclusivity is a function of the sorting of ancestral polymorphisms, allele sharing could 

219 also be due to the extent and timing of migration events. We therefore modeled 

220 migration scenarios and performed coalescent simulations to test models of divergence 

221 with gene flow vs. 2° contact, which have explicit expectations about the timing of 

222 migration events. 
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Coalescent simulations were performed in the program MCcoal (Rannala & Yang 



2003 Yang & Rannala 2010). In our simulations, we used a symmetric migration 



matrix and held the migration rate constant at 1 N e m (0.5 N e m in each direction), but 
varied the migration start and end times (Fig. 2). Divergence times and population 
sizes used in the simulations were derived from estimates of our empirical data in the 



228 programs BP&P (Yang & Rannala 2010) and Arlequin v3.5 (Excofher & Lischer 2010), 



respectively. We simulated species trees including no gene flow (Scenario A; Fig. 2a), 
ancestral gene flow between the common ancestors of S. horridus and S. spinosus 
(Scenario B; Fig. 2b), gene flow between ancestral populations as well as contemporary 
gene flow between one S. horridus and two S. spinosus lineages (lineages selected based 
on empirical results, see Results; Fig. 2c), gene flow between the common ancestors of S. 
horridus and S. spinosus, followed by a cessation of gene flow until contemporary gene 
flow between three lineages as above (Scenario D; Fig. 2d), and contemporary gene flow 
between one lineage of S. horridus and two lineages of S. spinosus (Scenario E; Fig. 2e). 
We restricted our simulations of gene flow to these models because the mtDNA clade 
showing admixture was comprised only of individuals from these three populations. 

We simulated 10,000 gene trees under each model, then calculated a gsi value for 
each group within each gene tree in the "genealogicalSorting" R package (using the 
"multitree" function). We focused empirical gsi calculations on the mtDNA locus in an 
attempt to resolve the putative pattern of interspecific gene flow. To account for 
phylogenetic uncertainty in the empirical data, we calculated the single ( "ensemble" ) 
gsi value for the mtDNA for each population on a posterior distribution of 8,000 trees 
inferred in MrBayes (v3.2; Ronquist and Huelsenbeck 2003[ ). For the MrBayes analysis, 
we partitioned the dataset by codon for protein-coding genes (one 12S partition, three 
partitions each for CytB, and four partitions for ND4 including the tRNA coding 
sequence) and assigned each the best substitution model determined in jModelTest v2 



(Darriba et al 2012; Guindon & Gascuel 2003). We ran two analyses for 10 7 



generations, sampling every 2000 steps, and discarded the first 20% as burn-in 



25i (determined by visual examination in Tracer vl.5 Rambaut & Drummond 2007). 
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252 To assess the probability that the empirical mtDNA gsi values are different from 

253 the gsi values from the simulated trees, we calculated the frequency of simulated gsi 

254 values that were in the tail of the distribution beyond the empirical value. These values 

255 could therefore be interpreted as one-half of the p-value statistic when testing the null 

256 expectation that the empirical mtDNA gsi values were drawn from the simulated gsi 

257 distribution. The comparison of empirical mtDNA gsi values to the simulated gsi values 

258 provide a statistical test of determining the timing of migration events in the S. 

259 spinosus group. 

260 

Estimation of Nuclear Gene Flow 
26i We estimated ancestral and contemporary levels of gene flow in the program IMa2 



262 (Hey 2010) using our empirical nDNA. This program estimates bi-directional and 

263 uni-directional migration rates, divergence times, and population sizes. The IM model 

264 assumes non-recombinant loci, constant population sizes, and that population-level 

265 sampling has been performed randomly. We performed analyses on three separate 

266 datasets, where the user-specified topologies were based on our empirical species tree 

267 estimate (see below): (1) only S. horridus populations (=3 extant populations), (2) 

268 only S. spinosus populations (=3 extant populations), and (3) both S. horridus and S. 

269 spinosus (=6 extant populations). For the three-population models, we specified 3xl0 5 

270 steps as burn-in with 3xl0 5 steps following burn-in, and allowed the program to infer 

271 migration rates amongst all pairwise lineage combinations (including ancestral gene 

272 flow). For the 6-population model, the burn-in period lasted for 5xl0 5 generations 

273 followed by 3xl0 5 steps post burn-in, and we estimated migration between all pairwise 

274 lineage combinations (including ancestral gene flow). Whereas the three-population 

275 models allowed us to examine gene flow between populations within each species 

276 (including ancestral gene flow), the 6-population model enabled us to test for gene flow 

277 across species (both extant and ancestral lineages). For all models, we ran four 

278 replicate analyses (using different starting seeds) of 100 chains with heating terms of 

279 0.98 and 0.90 (options -ha and -hb). Significant levels of migration were assessed using 
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the Nielsen & Wakeley (2001) test implemented in IMa2. 



Bayesian Phylogeographic Analysis 



We utilized Bayesian phylogeography (Lemey et al 2009, 2010) to determine the 



temporal and geographic extent of overlap, and therefore the possibility of introgression, 
between the populations comprising the "admixed" mtDNA clade. We utilized a 



285 method that was recently developed by Nylinder et al (2014) that applies the relaxed 



random walk (RRW) continuous phylogeographic approach (Lemey et al 2010) to relax 
the assumption of geographic rate diffusion homogeneity across branches in the species 



^ree. This method follows a two-tiered approach in the program BEAST (Drummond & 



Rambaut 2007) where a posterior distribution of species trees is first generated, which is 



290 then subsequently used in an RRW analysis. 



To generate the species tree, we used *BEAST vl.7.5 (Heled & Drummond 2010) 



on the 8-locus nuclear dataset, with individuals assigned to lineages based on our 
population assignment and BFD results (see Results section below). The species tree 
analysis only included individuals that did not show signs of admixture (i.e., we only 
included individuals with >0.90 posterior probability for belonging to one population). 
We calibrated the root of the (S. spinosus group + S. edwardtaylori) clade at 5.0 million 
years ago (mya) with a standard deviation of 0.5, based on the time-calibrated tree 



from Leache & Sites Jr (2010); this allowed us to place dates on the phylogeographic 



events within this group. Each gene was given its own partition and analyzed under the 
uncorrelated lognormal molecular clock with the preferred substitution model as 
mentioned above. Analyses were run for 3xl0 8 generations, logging every 2xl0 4 steps, 



and convergence was assessed in Tracer vl.5 (Rambaut & Drummond 2007) 



The species tree diffusion analysis was performed with BEAST vl.8.1. We used 
LogCombiner vl.8 from the BEAST package to combine and thin results from three 
independent species tree analyses. After pruning S. edwardtaylori in the program 



Mesquite (v2.75; Maddison and Maddison 2011), one thousand species trees from the 
posterior distribution were then used as input for the species tree diffusion analysis. We 
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circumscribed polygons in Google Earth to approximate extant distributions for each 



309 lineage/population based on published range maps (Smith 1939 Frost 1978 Smith & 



Chiszar 1992) and Geneland results; these polygons were then referenced along with the 



311 posterior distribution of species trees for analysis. We explored the effect of 

312 (geographic) starting location on species-level geographic diffusion by choosing two 

313 different starting locations within each species' boundaries. All priors on the RRW 

314 diffusion model were kept the same as in Nylinder et al. (2014). We ran four 

315 independent replicates of species tree diffusion analysis for 5xl0 8 generations each, 

316 logging every 5xl0 5 generations. The "time slice" function of the program SPREAD 



317 (Bielejec et al 2011) was then used to visualize the ancestral 80% HPD regions in 

318 Google Earth at 5xl0 5 year intervals from 3.0 - 0.5mya. All files used for Bayesian 

319 phylogeographic analysis are available online as supplementary materials. 

320 

Results 

Taxon Sampling 

321 We generated mtDNA data for 74 S. horridus, 74 S. spinosus, and four S. 

322 edwardtaylori. Our nDNA dataset consisted of a subset of the individuals present in the 

323 mtDNA dataset: 36 S. horridus, 32 S. spinosus, and two S. edwardtaylori. All 

324 individuals in the mtDNA dataset were amplified for at least one of the three 

325 mitochondrial regions examined, whereas the final nDNA dataset only consisted of 

326 individuals with sequence data for > 4 loci (> 50% complete matrix). 

327 

Molecular Data Collection 

328 The three mtDNA regions totaled 2,639bp with 859 variable sites, 714 of which 

329 were parsimony-informative (Table 1; GenBank accession nos. xxxx-xxxx). In contrast, 

330 the eight nDNA regions totaled 5,716bp with 459 variable sites and 420 

331 parsimony-informative sites (GenBank accession nos. xxxx-xxxx). Large indels (>10bp) 

332 were present in the intron (NOS1) and two anonymous loci (Sun_035, Sun_037), but 

333 these were not scored for usage in the phylogenetic analyses. No evidence of intra-genic 
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334 recombination was detected in any gene. 

335 

Population Assignment 

336 STRUCTURAMA analyses indicated the highest posterior probability resulted 

337 when partitioning individuals into six populations when the prior on k was <6 (Table 

338 2). When the prior mean on k was >7, seven populations were inferred in the S. 

339 spinosus group, indicating some sensitivity of our analysis to the prior distribution on 

340 k. Geneland results provided strong support for six distinct populations where this 

341 model (k=6) received >0.65 of the posterior probability of k values between 1 and 10 

342 across replicate analyses (results not shown). Three of these populations were composed 

343 of S. horridus individuals, and the other three populations were composed of S. 

344 spinosus individuals (Fig. 3). Proportions of population assignment based on Geneland 

345 output are shown in Figure 1. Nearly all individuals (65/68) showed >0.95 probability 

346 in belonging to a single cluster. The geographic boundaries of the populations inferred 

347 in Geneland are largely in agreement with currently recognized subspecific boundaries. 

348 

Phylogenetic Tree Estimation 

349 Phylogenetic trees for six out of the eight nDNA loci revealed moderate to strong 

350 support for the monophyly of one species to the exclusion of the other, whereas the 

351 remaining two loci showed some degree of species-level paraphyly (Supplemental Fig. 

352 1). Support values towards the tips of the trees (i.e., between alleles) were generally 

353 low. The position of S. edwardtaylori was variable across gene trees. The concatenated 

354 nDNA tree revealed strong support (bs = 100) for the sister relationship between S. 

355 edwardtaylori and (S. spinosus + S. horridus) (Figs. 1,4; Supplemental Fig. 2). The 

356 support for mutual exclusivity between S. spinosus and S. horridus was strong with bs 

357 values of 99 and 100 for each group, respectively. The nDNA was geographically 

358 structured with strongly supported clades in general agreement with currently 

359 recognized subspecific geographic boundaries (Fig. 4). However, it is important to note 

360 that not all populations inferred in our population assignment tests appear as natural 
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361 groups in the nDNA concatenated tree, specifically, central S. horridus and southern S. 

362 spinosus (Fig. 4). 

363 The (S. edwardtaylori, (S. spinosus, S. horridus)) relationship inferred with the 

364 nDNA is in stark contrast to the relationships inferred with the mtDNA. In the mtDNA 

365 tree, both S. spinosus and S. horridus were paraphyletic, with S. edwardtaylori nested 

366 within these two species with strong support (Figs. 1,4; Supplemental Fig. 3). The 

367 mtDNA tree also shows two clades of S. spinosus and two clades of S. horridus, in 

368 addition to one moderately supported clade consisting of both S. spinosus and S. 

369 horridus individuals. Interestingly, the S. spinosus and S. horridus individuals in this 

370 "admixed" clade occur in southern Mexico where these two species are sympatric (Fig. 

371 4). Although these putatively admixed individuals form a clade in the mtDNA tree, 

372 they belong to three distinct populations in the nDNA, specifically, central S. spinosus, 

373 southern S. spinosus, and southern S. horridus (Fig. 4). This phylogeographic result 

374 was interpreted as geographically localized mitochondrial introgression, as incomplete 

375 lineage sorting in the mtDNA would be expected to not leave a strong geographic 

376 signature. We therefore performed coalescent simulations with gene flow and used the 

377 gsi statistic to determine the timing of this admixture. 

378 

Species Delimitation 

379 Marginal likelihood estimates based on both PS and SS marginal likelihood 

380 estimators were very similar, and the ranking of models was identical, so we therefore 

381 only show the PS results. Out of the six species delimitation models examined, the 

382 model containing six species (corresponding to the six populations identified through 

383 population assignment analyses) was favored over all other models by a Bf > 70 (Table 

384 3) . This result was consistent across both datasets composed of all samples and "core" 

385 samples. These results did not match our expectation, given that non-zero levels of gene 

386 flow were detected between three population-pairs (see "Estimation of Nuclear Gene 

387 Flow" results below). The "2 pop" model that represented S. horridus and S. spinosus 

388 each as a single species composed of three populations was the lowest ranked model in 
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389 both datasets, indicating the strong possibility that currently described subspecies may 

390 warrant the recognition as distinct species. 

Genealogical Sorting Index 

391 We focus our gsi results on the central S. spinosus, southern S. spinosus, and 

392 southern S. horridus populations (and their ancestors), because these populations 

393 appeared to be admixed in the mtDNA tree and therefore were the populations in 

394 which we modeled gene flow (see Figs. 2,4). When gene flow was not modeled in our 

395 simulations, gsi values were relatively high (all values > 0.66; Scenario A; Table 3), i.e., 

396 relatively high levels of monophyly within populations. The gsi values reported for 

397 Scenario B, which included only historic gene flow between the common ancestors of S. 

398 horridus and S. spinosus (and therefore represents the model of divergence with gene 

399 flow), were similar to (but all less than) those reported for the model with no gene flow 

400 (Scenario A; Table 3; Supplemental Fig. 4), indicating that the gsi index did not do 

401 well at detecting ancestral gene flow. When migration amongst extant populations was 

402 included in the model (e.g., Scenarios C-E), gsi values markedly decreased (Table 3; 

403 Supplemental Fig. 4), particularly for the populations in which migration was modeled, 

404 demonstrating that the gsi statistic does much better at detecting recent gene flow, as 

405 opposed to ancestral gene flow. 

406 Based on the empirical mtDNA data, central and southern populations of S. 

407 horridus along with the central S. spinosus population returned the lowest gsi values (< 

408 0.55; Table 3), whereas gsi values for the other populations were all > 0.90 (Table 3). 

409 According to our test statistic, the probability that southern S. horridus had a history 

410 similar to those modeled by Scenarios A and B is very low (0.0002, and 0.003, 

411 respectively), meaning that this population experienced appreciable levels of ancestral 

412 gene flow (> lN e m; Fig. 2; Table 3). However, there is strong probability that central 

413 and southern S. spinosus populations match the history of Scenarios A and B (all 

414 p>0.09 for rejecting these scenarios), meaning they experienced negligible levels of 

415 ancestral gene flow (<lN e m; Table 3). The empirical mtDNA gsi values for southern S. 

416 horridus and central S. spinosus populations strongly matched the simulated 
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417 distribution values (all p>0.11 for rejecting these scenarios) when gene flow was 

418 modeled amongst extant lineages (Scenarios C-E; Figs. 2,5; Table 3). However, the 

419 empirical gsi value for the southern S. spinosus population did not fit the expected 

420 distribution of simulated gsi values (p<0.03 for rejecting these scenarios) resulting from 

421 these same scenarios modeling recent gene flow (Fig. 5; Table 3). 

422 

Estimation of Nuclear Gene Flow 

423 Although the 3-population models are nested subsets of the 6-population model, 

424 the IMa2 results were inconsistent between these analyses (Table 4). Significant levels 

425 of unidirectional gene flow were detected within S. horridus, from northern S. horridus 

426 into southern S. horridus, from southern S. horridus into central S. horridus, and 

427 historically, between the common ancestor of northern and central populations S. 

428 horridus populations with the southern S. horridus population (Table 4). Within S. 

429 spinosus, significant levels of gene flow were detected from the central S. spinosus 

430 population into southern S. spinosus, and historically, from northern S. spinosus into 

431 the common ancestor of central and southern S. spinosus populations (Table 4). The 

432 full 6-population model allowed us to test for gene flow between S. horridus and S. 

433 spinosus. In terms of migration across species, a significant migration rate was reported 

434 from southern S. horridus into northern S. spinosus, a result coincident with a scenario 

435 of interspecific mitochondrial introgression (Table 4). 

436 

Bayesian Phylogeographic Analysis 

437 Only one (S. h. horridus) individual was removed for the species tree analysis due 

438 to an admixed genotype (Fig. 1). The time-calibrated species tree revealed a root age of 

439 3.1 mya (1.55-5.61 95% C.I.) for the S. spinosus group (results not shown). Altering the 

440 starting coordinates for each population did not appear to have an affect on our species 

441 tree diffusion analyses. At 3.0 and 2.5 mya, the distributions of the common ancestors 

442 (CA) of S. horridus and S. spinosus were largely sympatric in southern Mexico (Fig. 

443 6). Sceloporus horridus split into two lineages at 2.1 mya, where southern S. horridus 
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was nearly 100% sympatric with the S. spinosus CA. At 1.5 mya, southern S. horridus 
had moved slightly to the east and shares less range overlap with the CA of S. spinosus. 
By 1.0 mya, southern S. horridus and the CA of central and southern S. spinosus 
populations overlap with each other by approximately 60%. At 0.5 mya, the central S. 
spinosus population is nearly 100% sympatric with the southern S. spinosus population, 
and southern S. horridus is sympatric in the east with both central and southern 
populations of S. spinosus (Fig. 6). These results indicate that all populations present 
in the admixed mtDNA clade were largely sympatric throughout their existence until 
the past < one million years, at which point populations began diverging in allopatry. 

Discussion 

Recent analytical advancements in gene flow detection have given researchers the 
ability to utilize multi-locus datasets to estimate migration not only amongst extant 



lineages, but also between ancestral lineages (e.g., Hey 2010). Similarly, 



phylogeographic analyses can be tested in a statistical framework (e.g., Chan et al 2011 



Pelletier & Carstens 2014). And recently, species trees, as opposed to gene trees, have 



459 become the currency of some phylogeographic approaches (Nylinder et al 2014). 



However, identifying the extent of historic geographic overlap and/or separation of 
lineages, parameters critical to differentiating between secondary contact and divergence 



with gene flow, has remained difficult (e.g., Pettengill & Moeller 2012). In this study. 



we employed phylogeographic and coalescent-based simulation approaches to determine 
two parameters that are often difficult to infer, particularly for ancestral lineages: the 
timing and geographic extent of gene flow. 



Phylogeography of the S. spinosus Group 
A number of phylogeographic studies have been performed in Mexico due to its 



468 rich orogenic history (e.g., Devitt 2006 Bryson et al 2011a Bryson Jr et al 2012a 



Leache et al 2013), and many studies have found that the major mountain ranges 



470 (Sierra Madre Occidental, western Mexico; Sierra Madre Oriental, eastern Mexico; 
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Trans-Mexican Volcanic Belt, southern-central Mexico; Sierra Madre del Sur, southern 
Mexico) have had major effects on the biogeographic patterns across many taxonomic 



groups (e.g., Bryson Jr et al 2012b Ruiz-Sanchez & Specht 2013). On the other hand, 
some researchers argue that some of these features do not represent single biogeographic 



475 entities (e.g., Corona et al 2007). Although the extant distribution of S. spinosus group 



taxa is similar to other species (e.g., Phrynosoma orbiculare; Bryson Jr et al 2012), 
subtleties in habitat (and therefore elevational) preferences result in a unique 
phylogeographic distribution across Mexico for this group, particularly in the geographic 



overlap of distinct populations in southeastern Mexico (but see Fernandez 2011). 



Population assignment and species delimitation analyses identified six independent 
lineages within the S. spinosus group (Fig. 3; Table 2); geographic distributions largely 
coincide with the ranges of subspecies (Figs. 1,3). The geographic boundaries of these 
lineages appear to be strongly influenced by the geology of the region. In southwestern 
Mexico, the Rio Santiago, Rio Ahuijullo, and the western portion of the Balsas basins 
form the interface between S. horridus populations 1 and 2 (Fig. 3). These barriers 
have also been implicated in lineage divergence of horned lizards {Phrynosoma; 



Bryson Jr et al 2012b) and rattlesnakes {Crotalus; Bryson et al 2011a). Similarly, the 



Trans-Mexican Volcanic Belt corresponds to the north-south barrier separating 
northern and central S. spinosus populations. That this geologic feature is a natural 
barrier causing population differentiation is no surprise, as many peaks in this range are 



>5000m and habitats are widely varied (Marshall & Liebherr 2000). The low elevation 



valleys between the Trans-Mexican Volcanic Belt and Sierra Madre del Sur in 
northwestern Oaxaca and eastern Puebla likewise seem to be isolating southern 



populations of S. spinosus, a pattern seen in other lizard species (Bryson & Riddle 



2012). 



The time-calibrated species tree indicated that the common ancestor of S. 
spinosus and S. horridus diverged approximately 3.1 mya (Fig. 6). This is in agreement 



with Cole's (1970) hypothesis that these two species originated in the late Pliocene. 



499 Since this time, Mexico has gone through a number of glacial and pluvial (precipitation) 
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cycles causing range expansions and contractions and population coalescence and 



divergence of many species (Hewitt 2004). Ancestral S. spinosus and S. horridus 



502 populations were isolated to the Central Mexican Plateau and western slope of the 



Sierra Madre Occidental, respectively, likely due to Pleistocene glacial cycles (Riddle & 



Hafner 2006). Following separation, pluvial climates allowed the northern and central 



505 populations of S. horridus to be "in more-or-less continuous contact with each other" 



see (Frost 1978). 



The prolonged extent of geographic overlap between ancestral lineages of S. 
horridus and S. spinosus provided ample opportunity for genetic exchange between 
these lineages. However, our simulation results showed that little to no ancestral gene 
flow occurred in this region (for two out of three lineages modeled; Table 3), which 
refutes the model of divergence with gene flow. The lack of ancestral gene flow, in spite 
of our phylogeographic results, could be for a few reasons. First, the ancestral locations 
of these lineages was incorrectly reconstructed. The method of species tree geographic 



5w diffusion is new (Nylinder et al 2014) and has not been tested under simulation, and we 



are therefore unaware of any inaccuracies it may have. Furthermore, the ancestral 
locations the method is allowed to explore are limited to the geographic extent of extant 
distributions (or however else the researcher chooses to draw the population-delimiting 
polygons prior to analysis). Simulations and further empirical studies must be 
performed with this method to determine its accuracy. Secondly, individuals within the 
reconstructed ancestral ranges may have been occupying the (small) regions 
allopatric/parapatric to the other species. This is possible, however, not likely, as the 
regions in allopatry are peripheral and small in comparison with each lineages' entire 
range. Third, although ancestral S. spinosus and S. horridus may have been broadly 
sympatric, they may have not been syntopic. Both species currently inhabit mostly 



xeric habitats, but show different microhabitat preferences (Cole 1970), meaning they 



simply may have not historically come into contact. And lastly, perhaps species-specific 
recognition cues were more pronounced due to reinforcement as ancestral populations 



diverged. Frost (1978) noted a northwest-southeast cline in S. h. albiventris / S . h. 
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529 oligoporus populations for some external morphological characters (e.g., color 

530 patterning) that he posited was due to reinforcement at the subspecific boundary. Such 

531 a situation could be a strong barrier to ancestral gene flow. 

532 The phylogeographic model of 2° contact is the most likely given our results, in 

533 concert. The simulation modeling 2° contact (Scenario E; Fig. 2) fit the empirical data 

534 for southern populations of both S. spinosus and S. horridus, although the empirical 

535 data for southern S. spinosus (population 3) did not fit the results from this scenario. 

536 Only one U S. spinosus south" individual was recovered in the admixed mtDNA clade, 

537 potentially indicating a low level of gene flow that did not match the simulations 

538 modeling a higher migration rate for this taxon. The split of the common ancestor of 

539 southern S. spinosus populations into its daughter lineages did not occur until around 

540 860,000 years ago. After this point, the ranges of southern S. horridus and S. spinosus 

541 shared a moderate amount of range overlap in southern Mexico where much of the 

542 admixed mtDNA clade is situated (Figs. 4,6). The patterns of, or lack thereof, nDNA 

543 ancestral gene flow detected in the IMa2 analyses further support the 2° contact model. 

544 No ancestral gene flow was detected between S. spinosus and S. horridus common 

545 ancestors, but was detected between extant populations of S. spinosus and S. horridus 



546 (Table 4). Although a new study by Leache et al (2013a) found evidence for divergence 

547 with gene flow between S. horridus and S. spinosus, the method they used did not allow 

548 for discernment between models of secondary contact vs. divergence with gene flow. 

549 

Mito-nuclear Discordance in the S. spinosus Group 
550 Numerous studies have reported conflicting evolutionary histories between nuclear 



55i and mitochondrial genomes ( "mito-nuclear discordance" , reviewed in Toews & Brelsford 



552 2012). Out of 126 studies identified by Toews & Brelsford (2012) that documented 

553 strong incongruence between mt- and nDNA biogeographic patterns, the overwhelming 

554 majority of cases (97%) reported that the discordance likely arose from geographic 

555 isolation followed by secondary contact; the most common form of mito-nuclear 

556 discordance is due to the asymmetric movement of mtDNA between lineages. In the 
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557 case of the S. spinosus group, we can safely rule out the possibility that incomplete 

558 lineage sorting (ILS) of mtDNA alleles as the cause of mito-nuclear incongruence, as we 

559 would expect ILS to leave a geographically-independent genealogical signature. We 

560 cannot, however, rule out the possibility that adaptive introgression may be a factor, 

561 particularly because many of the individuals belonging to the " admixed" mtDNA clade 

562 were collected in moderately high elevation sites (>2000m) where individuals with 



563 particular mitochondrial haplotypes may be better adapted (e.g., Cheviron & Brumfield 



2009J). 

565 The most likely cause of mito-nuclear discordance in the S. spinosus group 

see appears to be due to unidirectional gene flow from southern S. spinosus and S. horridus 

567 into central S. spinosus. The admixed mtDNA clade is composed of central S. spinosus, 

ses southern S. spinosus, and southern S. horridus individuals (Figs. 1,4). Whereas 

569 southern S. horridus and S. spinosus individuals were recovered in other mitochondrial 

570 clades, all central S. spinosus individuals were confined to the admixed clade. This 

571 phylogenetic pattern intimates that the admixed mtDNA clade was originally composed 

572 of all central S. spinosus individuals, and recently, that southern S. horridus and S. 

573 spinosus males have introgressed their mtDNA copies into central S. spinosus females. 

574 Our gsi results support the notion of recent (mitochondrial) gene flow between southern 

575 S. horridus and central S. spinosus, but not between southern S. spinosus and central 

576 S. spinosus (Table 3). 

577 

Population Distinctiveness and Divergence Within the Sceloporus spinosus Group 
578 Previous authors have claimed that no clear boundaries exist between species, or 



579 even subspecies, within the S. spinosus group. For instance, Boyer et al (1987) 

580 concluded that S. s. spinosus and S. h. horridus were conspecific based on the overlap 

581 of femoral pores and contact frequency of supraocular-median head scales as a result of 



582 intergradation. Smith & Chiszar (1992) later returned S. spinosus and S. horridus to 

583 specific status after a reinterpretation of these individuals as intergrades between S. s. 

584 spinosus and S. s. apicalis. Distinguishing between S. spinosus subspecies is also 
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585 problematic due to the slight difference in average values of quantitative characters 



586 between subspecies, where Smith & Chiszar ( 1992 ) note that examination only of a 



587 series of six or more permits "reasonably secure identifications". Similar problems exist 



588 within S. horridus, where Frost (1978) reported a large area of intergrade in western 



[jMexico between S. h. albiventris and S. h. oligoporus. Notwithstanding, Lemos-Espinal| 



et al (2004) regarded these taxa as distinct species. In summary, distinguishing between 
what have been considered as distinct taxa in the S. spinosus group, is problematic, 
and little agreement exists between previous authors. 



Contrary to some previous research (Wiens & Reeder 1997; Smith 2001), the 



results of our nDNA-based phylogeny show the S. spinosus group to be monophyletic 
(to the exclusion of S. edwardtaylori; Fig. 4). Furthermore, S. spinosus and S. horridus 
are monophyletic with respect to each other, a result at odds with previous research 



597 (Wiens et al 2010). This discrepancy is certainly due to the overriding signal of the 



598 mtDNA in the combined mt- and nDNA analysis of Wiens et al (2010). Lineages are 



599 often determined to be distinct based on an assessment of gene flow levels, a test of the 



biological species concept (Mayr 1942 Mayr et al 1963). Our tests of nuclear gene flow 



in the S. spinosus group revealed gene flow not only between populations of each 
species, but also across species (Table 4). But, the level of gene flow we detected in all 
instances was far below 0.5 N e m, a value used by some when determining species limits 



Porter 


1990 


Hey 


2009) 



605 6-population model, should be cautioned because the size of our molecular dataset is 



(Hey 


2010 


Choi & Hey 


2011 



606 iiKeiy inadequate to generate accurate resu 

607 We based our species delimitation models on a combination of results from 

60s population assignment and migration analyses. In an attempt to account for gene flow, 

609 which has been show to severely affect parameter estimation in coalescent-based species 



tree analyses (Leache 2009 Leache et al 2013b), we excluded individuals located near 



population boundaries (Fig. 3). This of course assumes that the gene flow we detected 
occurred on population boundaries, an assumption which may not be true. Removing 
these peripheral individuals did not affect our species delimitation results that indicated 
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6u the presence of six independent lineages in the S. spinosus group. Even though the 



6i5 simulations results from Grummer et al (2014) showed the BFD method to be effective 



at selecting the true species history when compared to falsely "lumping" lineages, we 
interpret these results with caution. A few instances have been reported with this 



6i8 method that select the model with the largest number of lineages (e.g., Bryson Jr et al 



2014), indicating a potentially systematic problem with this method to delimit species. 



However, this needs to be explored with further simulations. 

When comparing gsi values between our coalescent simulations and empirical 
mtDNA, it appears that the empirical data for the southern S. horridus population are 
most in agreement with scenarios modeling recent, but not ancestral, gene flow across 
species (Scenarios C-E; Fig. 2; Table 3). On the other hand, the empirical data of the 
southernmost S. spinosus population are in agreement with a scenario in which there 
was either no gene flow, or ancestral gene flow between S. spinosus and S. horridus 
ancestors. The gsi simulation results did not reject any scenario for the central S. 
spinosus population (Table 3). Our conclusions based on the gsi results are directly a 
function of the levels of gene flow used in our simulations. We used a relatively high 
migration value of 1 N e m in our simulations (0.5 N e m unidirectionally from each 
population, where N e m = the product of the effective population size and the migration 
rate per generation), where some researchers consider a migration rate of N e m >0.5 



633 enough to keep populations from diverging (Porter 1990). We therefore believe that we 



have modeled a realistic level of gene flow to assess matrilineal-based migration in the 
S. spinosus group. 



Conclusions 

The number of plausible models that should be evaluated in phylogeographic 



studies is nearly infinite (e.g., Tsai & Carstens 2013 Pelletier & Carstens 2014). Here, 



we generated a small number of plausible models based on the results from our 
empirical data. Given our results, we conclude that i) six independent genetic lineages 
exist in the S. spinosus group, and identifying species is important for accurately 
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642 modeling evolutionary histories, ii) coalescent simulations reject a model of ancestral 

643 gene flow in the S. spinosus group, iii) the Bayesian phylogeographic reconstruction for 

644 the ancestral ranges of the S. spinosus group suggests that species within the group 

645 broadly overlapped throughout a majority of their evolutionary history (~3 million 

646 years), and iv) mitochondrial introgression is localized spatially, and likely temporally 

647 as well. The contrasting evolutionary histories of the nuclear and mitochondrial 

648 genomes seem to indicate another example of the mtDNA locus not accurately 

649 representing the true species-level evolutionary history. However, the mitochondrial 

650 genome has nonetheless provided a valuable piece of information in determining the 

651 evolutionary history of the S. spinosus group by presenting evidence for the timing and 

652 geographic extent of contact between distinct populations in this group. 
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Figure 1: Sampling localities and species/subspecies distributions of Sceloporus spinosus 
and 5*. horridus in Mexico (based on ( Smith| 1939 Frost 1978 Smith & Chiszar 1992). 
Sampling localities with bold rings indicate specimens that have been amplified for nDNA 
in addition to mtDNA, whereas samples with a light ring have only mtDNA. The des- 
ignation (i.e., color) of nDNA samples was based on Geneland assignments (3 inferred 
populations for each species), and the designation of mtDNA samples to subspecies was 
based on morphological characters. Also shown are the concatenated mt- and nDNA trees 
inferred from RAxML, where values at nodes represent bootstrap proportions. Note the 
mixed clade of S. horridus and S. spinosus in the mtDNA tree, in addition to the con- 
trasting phylogenetic placement of S. edwardtaylori between mt- and nDNA trees. 



38 



Downloaded from http://biorxiv.org/ on September 18, 2014 




9.75 N 



e) 



■ hor_N 

■ hor_C 
> hor_S 



pspin_C 
-spin_S 
— spin_N 

-ed 



2.25 NL 



2.25 N„ 



Figure 2: The six scenarios modeled for coalescent-based simulations. Migration times 
are indicated in coalescent units, and migration events are indicated by diagonal shad- 
ing. Northern, central, and southern populations are denoted by "N", "C", and "S", 
respectively 
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Figure 3: Geneland analysis results showing the number of populations and the proba- 
bility of individual assignment to each population. These results can be interpreted as 
topographic maps, where white colors indicate high probabilities of assignment to that 
cluster and red represents low assignment probability. Blue dashed lines indicate which 
samples were included in the "core" sampling for BFD analyses. Black numbers in the 
lower left portion of each tile are the number of individuals in that cluster, whereas the 
blue number represents the number of individuals from that cluster in the "core" sampling 
scheme. 
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0.004 



Figure 4: Map showing geographic locations of putatively admixed individuals in south- 
eastern Mexico along with their phylogenetic positions in n- and mtDNA trees. Stars 
without bold outlines indicate individuals without nDNA data, and black dots in the 
phylogenetic trees indicate bootstrap values >70. Note that groupings identified on the 
nDNA tree are based on population assignment analyses and therefore are not all mono- 
phyletic groups (e.g., central S. horridus and southern S. spinosus). 
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Figure 5: Gsi (genealogical sorting index) results for both simulated and empirical 
datasets along with the species tree topology used in the simulations. Histograms to 
the right indicate the distribution of gsi values recorded during simulations (see text for 
simulation details) for central Sceloporus spinosus, southern S. spinosus, and southern 
S. horridus. Y-axis values range from 0-3000, and x-axis values of the gsi statistic range 
from 0-1. Red arrows indicate the gsi value for the mtDNA empirical data. Figure (a) 
shows the gsi results for the model with no migration (Scenario A), (b) represents historic 
gene flow only (Scenario B), and (c) represents the gsi values for Scenario E that models 
recent gene flow (histograms for Scenarios C,D looked nearly identical). 
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(f) 



Figure 6: Bayesian phylogeographic results under the relaxed random walk (RRW) species 
tree diffusion approach. Distributions indicate the 80% HPD location of the depicted 
lineages from 3.0 (a) to 0.5 mya (f) using the "time slice" feature in SPREAD. 
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Table 1: Information for the genetic data gathered in this study. The first three regions are 
mitochondrial regions, whereas the remainder are nuclear regions. Gene region "Type" 
abbreviations indicate noncoding (NC), protein-coding (PC), intron (I), and anonymous 
(A). 
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Table 2: Results from STRUCTURAMA indicating the posterior probability values when 
the prior mean on the number of populations (k) was varied. Values shown are the average 
of four independent runs. Bold values indicate the highest posterior probability for each 
prior mean on k. 



Number of 



Populations Prior Mean on Number of Populations (k) 





l 1 


2 


3 


4 


5 


6 


7 


8 


9 


10 


1 


0.00 
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0.01 



1 Results from the analysis with a k prior of 1 were unstable and reported a poste- 
rior probability of 1.0 for 68 populations 
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Table 3: Results from Bayes Factor Delimitation of species (BFD) analyses. Path sam- 
pling ("PS") and stepping stone marginal likelihood estimates were very similar, so we 
only show the PS results here. See Materials and Methods section for the composition of 
each species delimitation model. 



All Samples "Core" Samples 



Model 


# Species 


PS 


Bayes Factor 


PS 


Bayes Factor 


6 pop 
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-12854 




-11353 




southern spinosus migration 
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-12889 


71 


-11396 


86 


northern horridus migration 
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-12971 


234 


-11425 


144 


southern horridus migration 


5 


-12978 


248 


-11428 


151 


all horridus migration 
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-13181 


654 


-11536 


367 


2 pop 
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1129 
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Table 4: Gsi values for both empirical (mtDNA) and simulated datasets for all scenarios 
modeled (see Fig. 2). Northern, central, and southern populations are denoted by "N", 
"C" , and "S" , respectively. Numbers in parentheses indicate the frequency of simulation 
results more extreme than the empirical gsi value. 





Empirical 


Simulations 


Lineage 


mtDNA Scenario A 


Scenario B Scenario C Scenario D Scenario E 



N homdus 0.90 0.81 0.76 0.76 0.77 0.81 

C homdus 0.37 0.78 0.74 0.74 0.74 0.78 

S homdus 0.54 0.82 (0.0002) 0.76 (0.003) 0.50 (0.248) 0.50 (0.262) 0.50 (0.245) 

N spmosus 1.00 0.93 0.91 0.90 0.91 0.93 

C spmosus 0.53 0.73 (0.045) 0.67 (0.216) 0.39 (0.059) 0.39 (0.063) 0.39 (0.072) 

S spmosus 0.97 0.66 (0.289) 0.63 (0.286) 0.27 (0.010) 0.27 (0.011) 0.27 (0.011) 
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Table 5: Significant results from the isolation- migration (IMa2) analyses. Values given 
are in 2Nm, and N/As indicate that no significant migration estimates were reported 
for that model (e.g., 3-population or 6-population model). Northern, central, and south- 
ern populations are denoted by "N", "C", and "S", respectively. Common ancestors of 
two lineages are indicated with an underscore (_) between daughter lineage population 
numbers (e.g., horridus N_C is the common ancestor of northern and central S. horridus 
populations). Asterisks indicate significance levels for the Nielsen & Wakeley (2001) test. 



Lineage 


3-population 


6-population 




Models 


Model 


Extant 


horridus N — > horridus S 


0.i32***i 


0.67*** 


horridus S — > horridus C 


0.124* 


N/A 


spinosus C — >spinosus S 


0.197* 


N/A 


horridus S — > spinosus N 


N/A 


0.024* 


Ancestral 


horridus N_C — > horridus S 


3.537* 


N/A 


spinosus N — > spinosus C_S 


0.365* 


N/A 



1 *p<0.05; ***p<0.001 
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